Abstract. A minimax tree is similar to a Huffman tree except that,
instead of minimizing the weighted average of the leaves' depths, it
minimizes the maximum of any leaf's weight plus its depth. Golumbic (1976)
introduced minimax trees and gave a Huffman-like, $O(n \log n)$-time
algorithm for building them. Drmota and Szpankowski (2002) gave another
$O(n \log n)$-time algorithm, which checks the Kraft Inequality in
each step of a binary search. In this paper we show how Drmota and
Szpankowski's algorithm can be made to run in linear time on a word
RAM with $\Omega(\log n)$-bit words. We also discuss how our solution applies
to problems in data compression, group testing and circuit design.



1 Introduction 

In a minimax tree for a multiset $W = \{w_1, \ldots, w_n\}$ of weights, each leaf
has a weight $w_i$, each internal node has weight equal to the maximum of its
children's weights plus 1, and the weight of the root is as small as possible.
In other words, if $\ell_i$ is the depth of the leaf with weight $w_i$, then
$\max_i \{w_i + \ell_i\}$ is minimized. The weight of the root is called the
minimax cost of $W$, denoted $M(W)$. Golumbic [17] showed that if we modify
Huffman's algorithm [20] to repeatedly replace the two nodes with smallest
weights $w_i$ and $w_j$ by a node with weight $\max(w_i, w_j) + 1$ instead of
$w_i + w_j$, then it builds a minimax tree instead of a Huffman tree. Like
Huffman's algorithm, it takes $O(n \log n)$ time and can build trees of any
degree. Our results in this paper also generalize to higher degrees and larger
code alphabets but, for the sake of simplicity, we henceforth consider only
binary trees and alphabets. Golumbic, Parker [29] and Hoover, Klawe and
Pippenger [18] showed how to use Golumbic's algorithm to restrict circuits'
fan-in and fan-out
without greatly increasing their sizes or depths. Drmota and Szpankowski [9, 
10] pointed out that, if $P = p_1, \ldots, p_n$ is a probability distribution and
each $w_i = \log p_i$, then a minimax tree for $W$ is the code-tree for a prefix
code with minimum maximum pointwise redundancy with respect to $P$. (As we are
considering only binary trees in this paper, by $\log$ we always mean $\log_2$.)
They gave another $O(n \log n)$-time algorithm for building minimax trees and,
by analyzing it, proved bounds on the redundancy of arithmetic coding, which
Baer [3] recently improved by analyzing Golumbic's algorithm. Drmota and
Szpankowski start with a Shannon code [30] for $P$, in which the codeword for
the $i$th character has length $\lceil \log(1/p_i) \rceil$, for each $i$; they
sort the logarithms by their fractional parts, i.e.,
$\log(1/p_1) - \lfloor \log(1/p_1) \rfloor, \ldots, \log(1/p_n) - \lfloor \log(1/p_n) \rfloor$;
and they use binary search to find the largest value $x$ such that
$\lceil \log(1/p_1) - x \rceil, \ldots, \lceil \log(1/p_n) - x \rceil$
obey the Kraft Inequality [27]. In a previous paper [14] (see also [15,21]) we 
noted that minimax trees built with Golumbic's algorithm have the same Sib- 
ling Property [11,16] as Huffman trees, and turned the Faller-Gallager-Knuth 
algorithm [26] for dynamic Huffman coding into an algorithm for dynamic Shan- 
non coding. Intriguingly, although static Huffman coding is optimal and static 
Shannon coding is not, dynamic Shannon coding has a better worst-case bound 
than dynamic Huffman coding does. 
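
Golumbic's modification of Huffman's algorithm, as described above, can be
sketched in a few lines. The following is a minimal illustration rather than a
tuned implementation; it assumes a binary heap as the priority queue, and the
names `golumbic_minimax_tree` and `leaf_costs` are hypothetical helpers chosen
for this example.

```python
import heapq
from itertools import count


def golumbic_minimax_tree(weights):
    """Return (cost, tree); the tree is a nested tuple whose leaves are weights."""
    if not weights:
        raise ValueError("need at least one weight")
    tie = count()  # tie-breaker so the heap never has to compare subtrees
    heap = [(w, next(tie), w) for w in weights]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # the two nodes with smallest weights
        w2, _, t2 = heapq.heappop(heap)
        # Huffman would push w1 + w2 here; Golumbic pushes max(w1, w2) + 1.
        heapq.heappush(heap, (max(w1, w2) + 1, next(tie), (t1, t2)))
    cost, _, tree = heap[0]
    return cost, tree


def leaf_costs(tree, depth=0):
    """Yield weight + depth for every leaf of a nested-tuple tree."""
    if isinstance(tree, tuple):
        for child in tree:
            yield from leaf_costs(child, depth + 1)
    else:
        yield tree + depth
```

For the weights 3, 1, 1 the two lightest leaves are merged into a node of
weight 2, which is then merged with the leaf of weight 3, so the root has
weight 4; this equals the largest value of weight plus depth over the leaves.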

Hu, Kleitman and Tamaki [19] gave an $O(n \log n)$-time algorithm for building
alphabetic minimax trees, in which the leaves' weights, from left to right,
must be in the given order. Kirkpatrick and Klawe [23] and Coppersmith, Klawe
and Pippenger [6] gave an algorithm (or, more precisely, two algorithms that
are equivalent when trees are binary) that builds an alphabetic minimax tree for
integer weights in $O(n)$ time, and showed how to use it to restrict circuits'
fan-in and fan-out without greatly increasing their sizes or depths and without
changing the numbers of edge crossings (and, thus, preserving planarity).
Kirkpatrick and Klawe also showed how to combine their algorithm with binary
search in order to build alphabetic minimax trees for real weights in
$O(n \log n)$ time. We note that, if their algorithm for integer weights is
viewed as an alphabetic analogue of the Kraft Inequality (as it was by Yeung
[32] and Nakatsu [28], who independently rediscovered it), then their algorithm
for real weights is the alphabetic analogue of Drmota and Szpankowski's.
Kirkpatrick and Przytycka [24] gave an $O(\log n)$-time,
$O(n/\log n)$-processor algorithm for integer weights in the CREW PRAM model.
In another previous paper [13] we used a data structure due to Kirkpatrick and
Przytycka and a technique for generalized selection due to Klawe and Mumey [25]
to make Kirkpatrick and Klawe's algorithm for real weights run in
$O(n \min(\log n, d \log \log n))$ time, where $d$ is the number of distinct
values $\lceil w_i \rceil$. In this paper we prove a conjecture we made then,
that a similar modification can make Drmota and Szpankowski's algorithm run in
$O(n)$ time.

2 Applications 

In the full version of this paper, we will consider all of the following problems: 

A. build a prefix code with minimum maximum pointwise redundancy; 

B. given a good estimate of the distribution over an alphabet, build a good 
prefix code; 



C. given a good estimate of the distribution over a set, design a good group test 
to find the unique target; 

D. build a minimax tree for a multiset of real weights; 

E. build a Shannon code; 

F. build a tree whose leaves have at most given depths; 

G. restrict a circuit to have bounded fan-in or fan-out; 

H. build a minimax tree for a multiset of integer weights. 

The authors cited in the introduction have already shown, however, that
Problem A takes $O(n)$ more time than D, E than F, and F and G than H.
Therefore, in the current version of this paper, we consider only Problems B,
C, D and H. In the remainder of this section we define what we mean by "good"
in Problems B and C, and show they take $O(n)$ more time than D. Problems B and
C are, in fact, equivalent to each other and to A, and analogous to a problem
we considered in our paper [13] on building alphabetic minimax trees. In
Section 3 we give two $O(n)$-time algorithms for Problem H. Finally, in
Section 4 we show how to use either of those algorithms to obtain an algorithm
for Problem D that takes $O(n)$ time on a word RAM with $\Omega(\log n)$-bit
words. It follows that all the problems listed above take $O(n)$ time.

Suppose we want to build a good prefix code with which to compress a
file, but we are given only a sample of its characters. Let
$P = p_1, \ldots, p_n$ be the normalized distribution of characters in the file,
let $Q = q_1, \ldots, q_n$ be the normalized distribution of characters in the
sample and suppose our codewords are $C = c_1, \ldots, c_n$. An ideal code for
$Q$ assigns the $i$th character a codeword of length $\log(1/q_i)$ (which may
not be an integer), and the average codeword's length using such a code is
$H(P) + D(P\|Q)$, where $H(P) = \sum_i p_i \log(1/p_i)$ is the entropy of $P$
and $D(P\|Q) = \sum_i p_i \log(p_i/q_i)$ is the relative entropy between $P$
and $Q$. The entropy measures our expected surprise at a character drawn
uniformly at random from the file, given $P$; the relative entropy (also known
as the informational divergence or Kullback-Leibler pseudo-distance) measures
the increase in our expected surprise when we estimate $P$ by $Q$, and is often
used to quantify how well $Q$ approximates $P$ (see, e.g., [8]).
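
As a small numerical check of the identity above, the following sketch computes
$H(P)$, $D(P\|Q)$ and the average length of an ideal code for $Q$ on a toy
pair of distributions. The function names and the example values of `P` and `Q`
are illustrative assumptions, not taken from the paper; `ideal_average_length`
assumes $q_i > 0$ whenever $p_i > 0$.

```python
from math import inf, isclose, log2


def entropy(p):
    """H(P) = sum_i p_i log2(1/p_i), skipping zero-probability characters."""
    return sum(pi * log2(1 / pi) for pi in p if pi > 0)


def relative_entropy(p, q):
    """D(P||Q) = sum_i p_i log2(p_i/q_i); infinite if some q_i = 0 < p_i."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return inf
            total += pi * log2(pi / qi)
    return total


def ideal_average_length(p, q):
    """Average length under P when the ith codeword has length log2(1/q_i)."""
    return sum(pi * log2(1 / qi) for pi, qi in zip(p, q) if pi > 0)


P = [0.5, 0.25, 0.25]   # hypothetical distribution of the file
Q = [0.25, 0.25, 0.5]   # hypothetical distribution of the sample
assert isclose(ideal_average_length(P, Q), entropy(P) + relative_entropy(P, Q))
```

Here $H(P) = 1.5$ and $D(P\|Q) = 0.25$, so an ideal code for $Q$ spends 1.75
bits per character on average.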

Consider the best worst-case bound we can achieve, given only $Q$, on how
much the average codeword's length exceeds $H(P) + D(P\|Q)$. A result by Katona
and Nemetz [22] implies we do not generally achieve a constant bound on the
difference when $C$ is a Huffman code for $Q$. (Given $P$, of course, the best
bound we could achieve on how much the average codeword's length exceeds
$H(P)$ would be the redundancy of a Huffman code for $P$.) For example, if
$q_1, \ldots, q_n$ are proportional to $F_n, \ldots, F_1$, where $F_i$ denotes
the $i$th Fibonacci number (i.e., $F_1 = F_2 = 1$ and
$F_i = F_{i-1} + F_{i-2}$ for $i \ge 3$), then the codewords' lengths are
$1, \ldots, n - 2, n - 1, n - 1$ in any Huffman code for $Q$. If $p_n$ is
sufficiently close to 1, then

\[ H(P) + D(P\|Q) \approx \log(1/q_n) = n \log \phi + O(1) \]

but the average codeword's length $\sum_i p_i |c_i| \approx n - 1$, so for
large $n$ the difference is about $(1 - \log \phi) n \approx 0.31 n$, i.e.,
about $1/\log \phi - 1 \approx 0.44$ times $H(P) + D(P\|Q)$, where
$\phi \approx 1.62$ is the golden ratio.
As long as $q_i > 0$ whenever $p_i > 0$, the average codeword's length

\[ \sum_i p_i |c_i| = H(P) + D(P\|Q) + \sum_i p_i (\log q_i + |c_i|) \]

(if $q_i = 0$ but $p_i > 0$ for some $i$, then $D(P\|Q)$ is infinite). Notice
each $|c_i|$ is the length of a branch in the code-tree for $C$. Therefore, the
best bound we can achieve is

\[ \min_C \max_i \{\log q_i + |c_i|\} = M(\log q_1, \ldots, \log q_n), \]

which is less than 1, by inspection of Drmota and Szpankowski's algorithm.
(Recall that $M(\log q_1, \ldots, \log q_n)$ denotes the minimax cost of
$\{\log q_1, \ldots, \log q_n\}$, i.e., the weight of the root of a minimax
tree for $\{\log q_1, \ldots, \log q_n\}$.) Moreover, we achieve this bound
when the code-tree for $C$ has the same shape as a minimax tree for
$\{\log q_1, \ldots, \log q_n\}$. In other words, Problem B takes $O(n)$ more
time than D.

Now suppose we want to design a good group test (see, e.g., [1,2]) to find the
unique target in a set, given only an estimate $Q$ (presumably gained from past
experience or experimentation) of the probability distribution $P$ according to
which the target is chosen. A group test allows us to choose, repeatedly, a
subset of the elements and check whether the target is among them. We can
represent a group test as a decision tree in which each leaf is labelled with
an element and each internal node is labelled with the concatenation of its
children's labels. Because such a decision tree can be viewed as the code-tree
for a prefix code, and vice versa, the expected number of checks we make
exceeds $H(P) + D(P\|Q)$ by as little as possible when the decision tree for
our group test has the same shape as a minimax tree for
$\{\log q_1, \ldots, \log q_n\}$. In other words, Problem C is equivalent to B
and, therefore, also takes $O(n)$ more time than D.

We are currently studying whether either Drmota and Szpankowski's solution
to Problem A or our solution to B can give us an intuitive explanation
of why dynamic Shannon coding has a better worst-case bound than dynamic
Huffman coding does. On the one hand, worst-case bounds (especially for online
algorithms; see, e.g., [5]) are often proven by considering a game between the
algorithm and an omniscient adversary, and minimizing the maximum pointwise
redundancy at each step seems somehow related (more than just by name) to the
minimax strategy for the algorithm. On the other hand, dynamic prefix coding
can be viewed as a procedure in which we repeatedly build a prefix code based
on a sample, i.e., the characters already encoded.

3 Minimax Trees for Integer Weights

In this section we give two $O(n)$-time algorithms for building a minimax tree
for a multiset of integer weights, both based on the following lemma (which we
note applies to any weights, not only integers) and corollary:

Lemma 1. If $W = \{w_1, \ldots, w_n\}$ is a multiset of weights and

\[ W' = \{ \max(w_1, \max_i\{w_i\} - n + 1), \ldots, \max(w_n, \max_i\{w_i\} - n + 1) \}, \]

then $M(W') = M(W)$. Moreover, any minimax tree for $W'$ becomes a minimax
tree for $W$ when we replace the leaves' weights equal to
$\max_i\{w_i\} - n + 1$ by the weights in $W$ less than or equal to
$\max_i\{w_i\} - n + 1$, in any order.

Proof. Consider a minimax tree $T$ for $W$. Without loss of generality, we can
assume $T$ is strictly binary (i.e., that every internal node has exactly two
children) and, therefore, that it has height at most $n - 1$. (Recall that, for
simplicity, we consider only binary trees.) If $n = 1$, then
$w_1 = \max_i\{w_i\} - n + 1$, so $W' = W$. Otherwise, all the leaves have
depth at least 1, so $M(W) \ge \max_i\{w_i\} + 1$. Consider any leaf (if one
exists) with weight less than $\max_i\{w_i\} - n + 1$ and depth $\ell$. Since
$\max_i\{w_i\} - n + 1 + \ell \le \max_i\{w_i\} < M(W)$, increasing that leaf's
weight to $\max_i\{w_i\} - n + 1$ and updating its ancestors' weights does not
change the weight $M(W)$ of the root. It follows that $M(W') \le M(W)$; since
no weight in $W'$ is smaller than the corresponding weight in $W$, we also have
$M(W') \ge M(W)$, so $M(W') = M(W)$.

Now consider a minimax tree $T'$ for $W'$. If we replace the leaves' weights
equal to $\max_i\{w_i\} - n + 1$ by the weights in $W$ less than or equal to
$\max_i\{w_i\} - n + 1$ and update all the nodes' weights, then the weight
$M(W')$ of the root cannot increase nor, by definition, decrease to less than
$M(W)$. Since $M(W') = M(W)$, it follows that the re-weighted tree is a minimax
tree for $W$. □
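
Lemma 1 can be checked numerically on small inputs. The sketch below is only an
illustration under stated assumptions: it computes the minimax cost with a
heap-based version of Golumbic's rule (merge the two smallest weights), and the
names `minimax_cost` and `clamp_weights` and the sample multiset are
hypothetical.

```python
import heapq


def minimax_cost(weights):
    """M(W), computed by repeatedly merging the two smallest weights."""
    heap = list(weights)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + 1)
    return heap[0]


def clamp_weights(weights):
    """Build W' by raising every weight below max(W) - n + 1 to that value."""
    lowest = max(weights) - len(weights) + 1
    return [max(w, lowest) for w in weights]


W = [10, -50, 3, 9.5, -7]          # n = 5, so the threshold is 10 - 5 + 1 = 6
W_prime = clamp_weights(W)         # the three light weights all become 6
assert minimax_cost(W_prime) == minimax_cost(W)
```

The weights -50, 3 and -7 differ widely, yet replacing all three by 6 leaves
the minimax cost unchanged, as the lemma states.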

Corollary 1. When all the weights in $W$ are integers, we can sort $W'$ in
$O(n)$ time.

Proof. When all the weights in $W$ at least $\max_i\{w_i\} - n + 1$ are
integers, all the weights in $W'$ are integers in the interval
$[\max_i\{w_i\} - n + 1, \max_i\{w_i\}]$. Since this interval has length
$n - 1$, we can sort $W'$ in $O(n)$ time using either direct addressing, which
takes $O(n)$ extra space, or radix sort, which takes no extra space [12]. □
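
The direct-addressing option in the proof of Corollary 1 amounts to a counting
sort over the $n$ integers in the interval. The following sketch assumes
integer input and builds $W'$ on the fly; `sort_clamped_integers` is a
hypothetical name for this illustration.

```python
def sort_clamped_integers(weights):
    """Return W' in nondecreasing order, using one bucket per possible value."""
    n = len(weights)
    low = max(weights) - n + 1           # smallest value that occurs in W'
    counts = [0] * n                     # the interval [low, max] holds n integers
    for w in weights:
        counts[max(w, low) - low] += 1   # clamp to W', then address directly
    result = []
    for offset, c in enumerate(counts):
        result.extend([low + offset] * c)
    return result
```

Both loops touch $O(n)$ buckets and weights, so the running time is linear
however small the original weights were before clamping.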





For our first algorithm, we build and sort $W'$; build a minimax tree for $W'$
using an implementation of Golumbic's algorithm that takes $O(n)$ time when the
weights are already sorted; and replace the leaves' weights equal to
$\max_i\{w_i\} - n + 1$ by the weights in $W$ less than or equal to
$\max_i\{w_i\} - n + 1$. We note that Van Leeuwen [31] showed how to implement
Huffman's algorithm to take $O(n)$ time when the weights are already sorted. We
could implement Golumbic's algorithm analogously, but we think the
implementation below is simpler.

Lemma 2. Golumbic's algorithm can be implemented to take $O(n)$ time when
the weights are already sorted.

Proof. We start with the weights stored in a linked list in nondecreasing order,
and set a pointer to the head of the list. We then repeat the following procedure
until there is only one node left in the list, which is the root of a minimax
tree for the given weights: we move the pointer along the list to the last weight
less than or equal to the maximum of the first two weights plus 1; remove the
first two nodes from the list; make those nodes the children of a new node with
weight equal to the maximum of their weights plus 1; and insert the new node
immediately to the right of the pointer (or at the head of the list, if the
pointer is on one of the two nodes just removed). Notice we remove two nodes for
each one we insert, so the total number of nodes is $2n - 1$. Therefore, since
the pointer passes over each node once, this implementation takes $O(n)$
time. □
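
The procedure in the proof of Lemma 2 can be sketched as follows. This is one
possible reading of the proof, not a definitive implementation: the `Node`
class and function names are hypothetical, the input is assumed to be nonempty
and already in nondecreasing order, and the case in which the pointer rests on
a removed node is handled by making the new node the head of the list.

```python
class Node:
    def __init__(self, weight, left=None, right=None):
        self.weight = weight
        self.left = left
        self.right = right
        self.next = None   # successor in the sorted linked list


def minimax_tree_from_sorted(sorted_weights):
    """Golumbic's algorithm on nondecreasing weights, with a forward-only pointer."""
    nodes = [Node(w) for w in sorted_weights]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    head = pointer = nodes[0]
    while head.next is not None:
        first, second = head, head.next
        merged = Node(max(first.weight, second.weight) + 1, first, second)
        # Advance to the last node whose weight is at most the merged weight.
        while pointer.next is not None and pointer.next.weight <= merged.weight:
            pointer = pointer.next
        if pointer is first or pointer is second:
            # Every surviving node is heavier: the merged node becomes the head.
            merged.next = second.next
            head = pointer = merged
        else:
            merged.next = pointer.next
            pointer.next = merged
            head = second.next
    return head


def max_leaf_cost(node, depth=0):
    """Maximum of weight + depth over the leaves below node."""
    if node.left is None:
        return node.weight + depth
    return max(max_leaf_cost(node.left, depth + 1),
               max_leaf_cost(node.right, depth + 1))
```

Because each merged weight is at least as large as the previous one, the
pointer never needs to move backwards, which is what bounds the total work by
the number of nodes.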

Building and sorting $W'$ takes $O(n)$ time, by Corollary 1; building a minimax
tree for $W'$ takes $O(n)$ time, by Lemma 2; replacing the leaves' weights equal
to $\max_i\{w_i\} - n + 1$ by the weights in $W$ less than or equal to
$\max_i\{w_i\} - n + 1$ takes $O(n)$ time, because it can be done in any order.
By Lemma 1, the resulting tree is a minimax tree for $W$.

Theorem 1. Given a multiset $W$ of $n$ integer weights, we can build a minimax
tree for $W$ in $O(n)$ time.

Our second algorithm differs in its second step: instead of using Golumbic's
algorithm to build a minimax tree for $W'$, we use Kirkpatrick and Klawe's
$O(n)$-time algorithm for integer weights to build an alphabetic minimax tree
for the sequence $V$ consisting of the weights in $W'$ in non-increasing order.
The algorithm's correctness follows from the Kraft Inequality:

Theorem 2 (Kraft, 1949). If there exists a binary tree whose leaves have
depths $\ell_1, \ldots, \ell_n$, then $\sum_i 1/2^{\ell_i} \le 1$. Conversely,
if $\sum_i 1/2^{\ell_i} \le 1$ and $\ell_1 \le \cdots \le \ell_n$, then there
exists an ordered binary tree whose leaves, from left to right, have depths
$\ell_1, \ldots, \ell_n$.

By the latter part of Theorem 2 and a standard exchange argument (i.e., if a
minimax tree contains two leaves such that the deeper one has a higher weight
than the shallower one, then we can swap their weights), there exists a minimax
tree for $W'$ in which the leaves' weights are non-increasing from left to
right. Therefore, by definition, any alphabetic minimax tree for $V$ is a
minimax tree for $W'$.
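
The converse part of Theorem 2 is constructive. One standard way to see this,
used here as an illustration and not taken from the paper, is to assign
canonical codewords: reading the leaves from left to right, each codeword is
the previous one plus 1, extended with zeros to the next depth. The function
names below are hypothetical.

```python
from fractions import Fraction


def kraft_sum(depths):
    """Exact value of sum_i 1/2^{l_i}."""
    return sum(Fraction(1, 2 ** d) for d in depths)


def canonical_codewords(depths):
    """Root-to-leaf paths, left to right, for nondecreasing depths obeying Kraft."""
    if list(depths) != sorted(depths) or kraft_sum(depths) > 1:
        raise ValueError("depths must be nondecreasing and satisfy Kraft")
    words, code, prev = [], 0, 0
    for d in depths:
        code <<= d - prev                 # extend the running codeword to length d
        words.append(format(code, "b").zfill(d) if d else "")
        code += 1                         # move to the next leaf on the right
        prev = d
    return words
```

For depths 1, 2, 3, 3 the Kraft sum is exactly 1 and the paths are 0, 10, 110
and 111, which are the leaves of an ordered binary tree in left-to-right order.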

4 Minimax Trees for Real Weights

Strictly speaking, Drmota and Szpankowski's algorithm works only when given a
multiset of weights equal to $\{\log p_1, \ldots, \log p_n\}$ for some
probability distribution $P = p_1, \ldots, p_n$. For any value $c$, however, if
$W = \{w_1, \ldots, w_n\}$ and $W' = \{w_1 + c, \ldots, w_n + c\}$ then, by
definition, $M(W') = M(W) + c$ and any minimax tree for $W'$ becomes a minimax
tree for $W$ when we subtract $c$ from each leaf's weight. In particular, if
$c = -\log(\sum_i 2^{w_i})$ then
$\sum_i 2^{w_i + c} = 2^c \sum_i 2^{w_i} = 1$; therefore,
$W' = \{\log p_1, \ldots, \log p_n\}$ for some probability distribution
$P = p_1, \ldots, p_n$ and we can use Drmota and Szpankowski's algorithm to
build minimax trees for $W'$ and, thus, for $W$. Without loss of generality, we
henceforth assume the given multiset $W$ of weights is equal to
$\{\log p_1, \ldots, \log p_n\}$ for some probability distribution $P$ (so each
$w_i \le 0$).
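
The shift by $c$ described above is a one-line computation. The sketch below is
a floating-point illustration only (very large weights would overflow
`2.0 ** w`); `normalize_weights` and the sample weights are hypothetical.

```python
from math import isclose, log2


def normalize_weights(weights):
    """Shift every weight by c = -log2(sum_i 2^{w_i}) so that sum_i 2^{w_i + c} = 1."""
    c = -log2(sum(2.0 ** w for w in weights))
    return c, [w + c for w in weights]


c, shifted = normalize_weights([1.0, 0.0, 0.0])
probabilities = [2.0 ** w for w in shifted]   # the distribution P with w_i + c = log p_i
assert isclose(sum(probabilities), 1.0)
assert all(w <= 0 for w in shifted)
```

Here $\sum_i 2^{w_i} = 4$, so $c = -2$ and the shifted weights are the
logarithms of the probabilities 1/2, 1/4 and 1/4.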

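Theorem 3 (Drmota and Szpankowski, 2002). If $W = \{w_1, \ldots, w_n\}$ is a
multiset of weights,
$X = \{x_1, \ldots, x_n\} = \{|w_1| - \lfloor |w_1| \rfloor, \ldots, |w_n| - \lfloor |w_n| \rfloor\}$
and $x_i$ is the largest element in $X \cup \{0\}$ such that

\[ \sum_{x_j \le x_i} \frac{1}{2^{\lfloor |w_j| \rfloor}} + \sum_{x_j > x_i} \frac{1}{2^{\lceil |w_j| \rceil}} \le 1, \]

then any minimax tree for
$\{-\lfloor |w_j| \rfloor : x_j \le x_i\} \cup \{-\lceil |w_j| \rceil : x_j > x_i\}$
becomes a minimax tree for $W$ when we replace each leaf's weight
$-\lfloor |w_j| \rfloor$ or $-\lceil |w_j| \rceil$ by $w_j$.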

If $x_1 \le \cdots \le x_n$ and $x_{i+1} > x_i$ then, by Theorem 3, $i$ is the
largest index such that
$\{\lfloor |w_j| \rfloor : x_j \le x_i\} \cup \{\lceil |w_j| \rceil : x_j > x_i\}$
satisfies the Kraft Inequality. To build a minimax tree for $W$ with Drmota and
Szpankowski's algorithm, we compute and sort $X$; use binary search to find
$i$, in each round testing whether the Kraft Inequality holds; build a minimax
tree for
$\{-\lfloor |w_1| \rfloor, \ldots, -\lfloor |w_i| \rfloor, -\lceil |w_{i+1}| \rceil, \ldots, -\lceil |w_n| \rceil\}$;
and replace each leaf's weight $-\lfloor |w_j| \rfloor$ or
$-\lceil |w_j| \rceil$ by $w_j$. Our version differs in three ways: we use
generalized selection instead of sorting and binary search; we use a new data
structure to test the Kraft Inequality; and we use either of our algorithms
from Section 3 to build the minimax tree for
$\{-\lfloor |w_1| \rfloor, \ldots, -\lfloor |w_i| \rfloor, -\lceil |w_{i+1}| \rceil, \ldots, -\lceil |w_n| \rceil\}$.
In the remainder of this section we first show how to use generalized selection
to find $x_i$ in $O(n)$ time, excluding the time needed to test the Kraft
Inequality; we then show how to perform all the necessary tests in a total of
$O(n)$ time on a word RAM with $\Omega(\log n)$-bit words, using our new data
structure. Since each of our algorithms from Section 3 takes $O(n)$ time, it
follows that we can build a minimax tree for $W$ in $O(n)$ time.

To find $x_i$ in $O(n)$ time with generalized selection, we start with the
multiset $X_1 = X \cup \{0\}$ and repeat the following procedure until we reach
the empty set: in the $r$th round, we use the linear-time selection algorithm
due to Blum et al. [4] to find the current multiset $X_r$'s median $x_m$, then
test whether

\[ \sum_{x_j \le x_m} \frac{1}{2^{\lfloor |w_j| \rfloor}} + \sum_{x_j > x_m} \frac{1}{2^{\lceil |w_j| \rceil}} \le 1; \]

if so, we remove those elements of $X_r$ that are less than or equal to $x_m$
and recurse on the resulting multiset; if not, we remove those elements of
$X_r$ that are greater than or equal to $x_m$ and recurse. The element $x_i$ is
the largest median we consider for which the test is positive. Since the size
of the multisets decreases by a factor of at least 2 in each round, we use
$O(\log n)$ rounds and we find all the medians in a total of $O(n)$ time.
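
The search just described can be sketched as below. Two simplifications are
assumptions of this illustration and not part of the paper's method: the median
is found with `statistics.median_low`, which sorts, rather than with the
linear-time selection of Blum et al.; and each Kraft test is recomputed from
scratch with exact fractions, so the sketch shows the logic of the search but
not the $O(n)$ running time. The function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor, log2
from statistics import median_low


def kraft_holds(abs_weights, threshold):
    """Exact test of the Kraft Inequality when x_j <= threshold rounds down."""
    total = Fraction(0)
    for a in abs_weights:
        length = floor(a) if a - floor(a) <= threshold else ceil(a)
        total += Fraction(1, 2 ** length)
    return total <= 1


def largest_threshold(weights):
    """Find x_i by repeatedly testing the median of the remaining candidates."""
    abs_weights = [abs(w) for w in weights]
    candidates = [a - floor(a) for a in abs_weights] + [0.0]   # X together with 0
    best = 0.0
    while candidates:
        x_m = median_low(candidates)
        if kraft_holds(abs_weights, x_m):
            best = max(best, x_m)
            candidates = [x for x in candidates if x > x_m]
        else:
            candidates = [x for x in candidates if x < x_m]
    return best


W = [log2(0.4), log2(0.3), log2(0.3)]   # weights log p_i for P = 0.4, 0.3, 0.3
x_i = largest_threshold(W)
```

For this $P$ the fractional parts are roughly 0.32 and 0.74. Rounding down only
the first weight gives lengths 1, 2, 2, whose Kraft sum is exactly 1, whereas
rounding all three down gives 1, 1, 1, which violates the inequality, so the
search settles on the smaller fractional part.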

By the same arguments we used to prove Lemma 1, we can assume, without
loss of generality, that $\lceil |w_j| \rceil \le n - 1$ for each $j$. To test
the Kraft Inequality, we use a data structure consisting of two $n$-bit binary
fractions, $S_1$ and $S_2$, each broken into $(\log n)$-bit blocks and
initially set to 0. For $1 \le k \le n - 1$, adding $1/2^k$ to either fraction
takes $O(1)$ amortized time, for the same reason that incrementing a binary
counter takes $O(1)$ amortized time (see, e.g., [7, Section 17.3]). On a word
RAM with $\Omega(\log n)$-bit words, nondestructively testing whether
$S_1 + S_2 \le 1$ takes $O(n/\log n)$ time, because adding each corresponding
pair of blocks takes $O(1)$ time and, by induction, the number carried from
each pair to the next is at most 1; resetting either fraction to 0 takes $O(1)$
time for each block, i.e., $O(n/\log n)$ time in total.
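
A minimal sketch of such a blocked fraction is given below. It is an
illustration under stated assumptions rather than the paper's data structure:
Python integers stand in for machine words, the class and function names are
hypothetical, and an explicit integer part `whole` is added so that a sum equal
to exactly 1 can be represented.

```python
class BlockFraction:
    """A binary fraction stored in fixed-width blocks, most significant first."""

    def __init__(self, bits, block_bits):
        self.block_bits = block_bits
        self.num_blocks = -(-bits // block_bits)    # ceiling division
        self.blocks = [0] * self.num_blocks
        self.whole = 0                              # integer part of the value

    def add_power(self, k):
        """Add 1/2^k (1 <= k <= bits), carrying toward the integer part."""
        index, offset = divmod(k - 1, self.block_bits)
        carry = 1 << (self.block_bits - 1 - offset)
        mask = (1 << self.block_bits) - 1
        while carry and index >= 0:
            total = self.blocks[index] + carry
            self.blocks[index] = total & mask
            carry = total >> self.block_bits        # 0 or 1
            index -= 1
        self.whole += carry

    def reset(self):
        self.blocks = [0] * self.num_blocks
        self.whole = 0


def sum_at_most_one(s1, s2):
    """Nondestructively test s1 + s2 <= 1, one pair of blocks at a time."""
    mask = (1 << s1.block_bits) - 1
    carry = 0
    fraction_is_zero = True
    for b1, b2 in zip(reversed(s1.blocks), reversed(s2.blocks)):
        total = b1 + b2 + carry
        carry = total >> s1.block_bits              # never exceeds 1
        if total & mask:
            fraction_is_zero = False
    whole = s1.whole + s2.whole + carry
    return whole == 0 or (whole == 1 and fraction_is_zero)


s1 = BlockFraction(8, 3)
s2 = BlockFraction(8, 3)
for k in (1, 2):
    s1.add_power(k)      # s1 = 1/2 + 1/4
for k in (3, 3):
    s2.add_power(k)      # s2 = 1/8 + 1/8, which carries into the 1/4 position
```

The test scans the blocks from least to most significant and never writes to
either fraction, so its cost is proportional to the number of blocks.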

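Before starting to search for $x_i$, we set
$S_1 = \sum_j 1/2^{\lceil |w_j| \rceil}$ in $O(n)$ time. Throughout our
generalized selection, we maintain the invariant that, at the beginning of the
$r$th round,

\[ S_1 = \sum_j \frac{1}{2^{\lceil |w_j| \rceil}} + \sum_{0 < x_j < \min(X_r)} \frac{1}{2^{\lceil |w_j| \rceil}} \]

and $S_2 = 0$. In the $r$th round, we set

\[ S_2 = \sum_{\min(X_r) \le x_j \le x_m} \frac{1}{2^{\lceil |w_j| \rceil}} \]

in $O(|X_r|)$ time (omitting any terms with $x_j = 0$, for which the floor and
ceiling of $|w_j|$ coincide). Since
$1/2^{\lfloor |w_j| \rfloor} = 2/2^{\lceil |w_j| \rceil}$ whenever $x_j > 0$,

\[ S_1 + S_2 = \sum_j \frac{1}{2^{\lceil |w_j| \rceil}} + \sum_{0 < x_j < \min(X_r)} \frac{1}{2^{\lceil |w_j| \rceil}} + \sum_{\min(X_r) \le x_j \le x_m} \frac{1}{2^{\lceil |w_j| \rceil}} \]
\[ = \sum_{x_j \le x_m} \frac{1}{2^{\lfloor |w_j| \rfloor}} + \sum_{x_j > x_m} \frac{1}{2^{\lceil |w_j| \rceil}}, \]

so we can test the Kraft Inequality in $O(n/\log n)$ time by checking whether
$S_1 + S_2 \le 1$. If the test is positive, then we add $S_2$ to $S_1$ in
$O(n/\log n)$ time; if the test is negative, then we do not change $S_1$. In
either case, straightforward calculation shows that, afterwards,

\[ S_1 = \sum_j \frac{1}{2^{\lceil |w_j| \rceil}} + \sum_{0 < x_j < \min(X_{r+1})} \frac{1}{2^{\lceil |w_j| \rceil}} \]

so the first part of our invariant is maintained. Finally, we reset $S_2 = 0$
in $O(n/\log n)$ time, so the second part of our invariant is maintained. Since
$|X_r| = O(n/2^r)$, the $r$th round takes a total of $O(n/2^r + n/\log n)$
time. Since $\sum_{r \ge 1} n/2^r = n$ and we use $O(\log n)$ rounds, it
follows that our whole generalized selection takes $O(n)$ time. This completes
the proof of our main result: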

Theorem 4. Given a multiset $W$ of $n$ real weights, we can build a minimax tree
for $W$ in $O(n)$ time on a word RAM with $\Omega(\log n)$-bit words.